A generalization of majorization that characterizes Shannon entropy

Markus P. Muller and Michele Pastena 


Abstract —We introduce a binary relation on the finite discrete 
probability distributions which generalizes notions of majorization
that have been studied in quantum information theory. Motivated
by questions in thermodynamics, our relation describes
the transitions induced by bistochastic maps in the presence of
additional auxiliary systems which may become correlated in the
process. We show that this relation is completely characterized
by Shannon entropy H, which yields an interpretation of H in
resource-theoretic terms, and admits a particularly simple proof
of a known characterization of H in terms of natural
information-theoretic properties.

I. Introduction 

Majorization and its relation to entropy play a
crucial role in many areas of probability and information
theory [1]. A discrete probability distribution $p = (p_1, \ldots, p_n)$
is said to majorize another probability distribution
$q = (q_1, \ldots, q_n)$, denoted

$$p \succ q,$$

if and only if there is a bistochastic map $\Phi$ such that
$q = \Phi(p)$. The bistochastic maps are exactly the convex
combinations of permutations; therefore, q is a random mixture
of reshufflings of p and in this sense more disordered than p.
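
As a small numerical illustration of the statement that a convex combination of permutations produces a more disordered distribution, the following Python sketch mixes three permutations of an example vector and checks the partial-sum criterion for majorization. The helper names (`apply_mixture`, `majorizes`), the example distribution and the mixing weights are illustrative assumptions and do not come from the paper.

```python
import math

def apply_mixture(p, weighted_perms):
    """Apply the map sum_k w_k * P_k to p (a convex mixture of permutations).

    weighted_perms is a list of (weight, perm) pairs, where perm is a tuple
    and the permuted vector has entries p[perm[i]].
    """
    q = [0.0] * len(p)
    for w, perm in weighted_perms:
        for i, src in enumerate(perm):
            q[i] += w * p[src]
    return q

def majorizes(p, q, tol=1e-12):
    """Return True if p majorizes q (partial sums of the sorted entries)."""
    ps, qs = sorted(p, reverse=True), sorted(q, reverse=True)
    sp = sq = 0.0
    for a, b in zip(ps, qs):
        sp += a
        sq += b
        if sp < sq - tol:
            return False
    return True

def shannon_entropy(p):
    """Shannon entropy with natural logarithm and 0 log 0 := 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

# Hypothetical example: weights are non-negative and sum to one.
p = [0.7, 0.2, 0.1]
mix = [(0.5, (0, 1, 2)), (0.3, (1, 0, 2)), (0.2, (2, 1, 0))]
q = apply_mixture(p, mix)
```

In this example `q` is again a probability distribution, `majorizes(p, q)` holds, and the Shannon entropy of `q` is not smaller than that of `p`.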

Since disorder and entropy are recurrent themes in thermodynamics,
it comes as no surprise that majorization plays a
major role there as well. In particular, bistochastic maps and 
the majorization relation have been shown to determine the 
thermodynamically allowed state transitions of systems out 
of equilibrium in the absence of energy constraints El, a, 
i). Similarly, majorization has been shown to determine the 
interconvertibility of entangled pure quantum states by local 
operations and classical communication a, a. 

Mathematically, these applications have led to the study of 
majorization in the context of joint distributions of several 
random variables, in particular product distributions El, ||8l, 
a. These appear naturally in the context of resource theories a,
where random variables represent physical systems,
and one asks how certain allowed transformations (such as
bistochastic maps) are able to interconvert a given state of a
physical system into another one. The interplay of the states
of several physical systems is of obvious interest, with the 
intuition that sometimes the presence of one physical system 
(say, a battery) can help to perform state transitions on another 
physical system (say, a laser pointer). 

In this paper, we introduce a multipartite notion of majorization
which is meant to elucidate the relation between disorder
and correlation. In a nutshell, while majorization determines
whether a transformation $p \to q$ is possible via bistochastic
maps, we study transformations of the form

$$p \otimes (r_1 \otimes \ldots \otimes r_k) \to q \otimes r_{1,\ldots,k} \qquad (1)$$

which map p to q, but at the same time correlate k auxiliary
systems without changing their marginals. Here, $\otimes$ denotes the
Kronecker product, i.e. $p \otimes r$ is a product distribution on two
systems; $r_{1,\ldots,k}$ denotes a joint probability distribution on k
systems, with marginals $r_1, \ldots, r_k$. Given two distributions
p and q, we ask whether there exists some $k \in \mathbb{N}_0$ and
$r_{1,\ldots,k}$ such that transition (1) is possible via some bistochastic
map. We can also fix a given value of k, in which case (1)
generalizes the notions of majorization ($k = 0$) and trumping
111 ($k = 1$) that have been extensively studied in quantum
information theory.

In the case where both p and q do not contain zeros, and
are not identical up to permutation, we show in Theorem 1
below that a transformation of the form (1) is possible if
and only if $H(p) < H(q)$, for H the Shannon entropy.
Thus, the possibility or impossibility of transitions of the
form (1) is completely characterized by Shannon entropy.
Furthermore, this insight can be used to give a particularly
simple proof of a version of a known characterization of
Shannon entropy: Aczél et al. [10, Lemma 5] have shown that
H is the unique real function (up to additive and multiplicative
constants) on the probability distributions without zeros which
is symmetric, additive, and subadditive. If we additionally
assume continuity, then Theorem 1 yields this characterization
of H as a simple corollary, cf. Corollary 3.

While the detailed thermodynamic interpretation of (1)
has been discussed elsewhere M, the main idea can be 
phrased in the language of resource theories: if $p \not\succ q$ such
that p cannot be transformed into q by bistochastic maps,
but nevertheless $H(p) < H(q)$ such that (1) is possible,
then stochastic independence is used as a resource a, m 
in the transition. In other words: the additional creation of 
correlations $r_1 \otimes \ldots \otimes r_k \to r_{1,\ldots,k}$ in the auxiliary systems
enables the otherwise impossible transition $p \to q$. This is
comparable to the situation in Landauer’s principle [11], where
the erasure of one bit of information, $(1/2, 1/2) \to (1, 0)$, can be
accomplished at the additional expense of energy.
This paper is organized as follows. In Section II we
give precise mathematical definitions and formulations of our
results, Theorem 1 and Corollary 3. Furthermore, we explain
how the results fit into the context of previous research on
majorization and characterizations of entropy, and explain
some results and open problems related to the value of k in (1).
In Section III we give a proof of Theorem 1 which is
accomplished by construction of a suitable auxiliary distribution
$r_{1,\ldots,k}$ (however, with several non-trivial twists). Section IV
shows how Corollary 3 follows as a simple consequence.
We conclude with Section V, where we argue that our new
relation (1) may be a special case of a wide variety of
interesting generalizations of majorization, characterizing transitions
under consumptions of different kinds of information-theoretic
resources.

II. Main results and their context 

In this paper, we are only considering finite discrete
probability distributions. That is, in what follows, a probability
distribution is a vector $p = (p_1, \ldots, p_n) \in \mathbb{R}^n$ for some $n \in \mathbb{N}$
with the property that all $p_i \geq 0$ and $\sum_i p_i = 1$. If we have a
bipartite probability distribution, i.e. a joint distribution of two
random variables A and B, then we denote this distribution
by $p_{AB}$, and its marginals by $p_A$ resp. $p_B$. In the case of
$k \geq 2$ random variables, we also use the notation $p_{1,2,\ldots,k}$ for
the joint distribution, and $p_i$ for its marginal on the i-th random
variable, which should not be confused with the i-th entry of
a vector p. The largest fixed number of systems or random
variables that we consider explicitly will be five, which we
denote by A, B, C, D, E.

Majorization is defined in the following way. If $p, q \in \mathbb{R}^m$
are probability distributions, then¹

$$p \succ q \;\Leftrightarrow\; \sum_{i=1}^k p_i^\downarrow \geq \sum_{i=1}^k q_i^\downarrow \qquad (k = 1, \ldots, m), \qquad (2)$$

where $p^\downarrow = (p_1^\downarrow, \ldots, p_m^\downarrow)$ denotes the reordering of the entries
of p in descending order, i.e. $p_i^\downarrow = p_{\pi(i)}$ for some permutation
$\pi$ such that $p_1^\downarrow \geq p_2^\downarrow \geq \ldots \geq p_m^\downarrow$. This is equivalent [1] to
the existence of a bistochastic map $\Phi$, i.e. a linear map on
$\mathbb{R}^m$ with $\Phi(1, \ldots, 1)^T = (1, \ldots, 1)^T$, mapping probability
distributions to probability distributions, such that $\Phi(p) = q$.
Maps $\Phi$ of this kind are represented by bistochastic matrices,
i.e. square matrices with non-negative entries and row and
column sums equal to one. Given any probability distribution
$p \in \mathbb{R}^m$, we define the rank of p as the number of non-zero
entries of p. That is,

$$\mathrm{rank}(p) := \#\{i \mid p_i \neq 0\}.$$

Furthermore, the Shannon entropy of any probability distribution
$p \in \mathbb{R}^m$ is defined as

$$H(p) := -\sum_{i=1}^m p_i \log p_i,$$

where $0 \log 0 := 0$ by definition, and log denotes the natural
logarithm, i.e. $\exp(\log x) = x$.

¹If majorization is defined for arbitrary vectors $p, q \in \mathbb{R}^m$, one has to
add the additional constraint $\sum_{i=1}^m p_i = \sum_{i=1}^m q_i$. Since we are only
considering probability distributions here, this condition is automatically
satisfied and does not have to be specified.
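
The definitions of majorization via partial sums, of the rank, and of Shannon entropy translate directly into code. The following Python sketch is one possible transcription; the function names, the tolerance parameter `tol`, and the example vectors are assumptions made here for illustration.

```python
import math

def rank(p):
    """Number of non-zero entries of p."""
    return sum(1 for x in p if x != 0)

def shannon_entropy(p):
    """H(p) = -sum_i p_i log p_i with natural logarithm and 0 log 0 := 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def majorizes(p, q, tol=1e-12):
    """Check p > q in the majorization order via partial sums of the
    entries sorted in descending order. tol absorbs rounding errors."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    ps, qs = sorted(p, reverse=True), sorted(q, reverse=True)
    sp = sq = 0.0
    for a, b in zip(ps, qs):
        sp += a
        sq += b
        if sp < sq - tol:
            return False
    return True

# Hypothetical example vectors.
p = [0.5, 0.25, 0.25, 0.0]
q = [0.4, 0.4, 0.1, 0.1]
uniform = [0.25] * 4
```

For these vectors neither `majorizes(p, q)` nor `majorizes(q, p)` holds, although `rank(p) <= rank(q)` and `shannon_entropy(p) < shannon_entropy(q)`; every distribution majorizes the uniform one.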

With this notation at hand, we are ready to state our main 
result: 

Theorem 1: Let $p, q \in \mathbb{R}^m$ be probability distributions with
$p^\downarrow \neq q^\downarrow$. Then there exists $k \in \mathbb{N}_0$ and a k-partite probability
distribution $r_{1,2,\ldots,k}$ such that

$$p \otimes (r_1 \otimes r_2 \otimes \ldots \otimes r_k) \succ q \otimes r_{1,2,\ldots,k} \qquad (3)$$

if and only if $\mathrm{rank}(p) \leq \mathrm{rank}(q)$ and $H(p) < H(q)$.
Moreover, we can always choose $k = 3$.

Note that if $p^\downarrow = q^\downarrow$, then q is a permutation of p, so $p \succ q$,
and (3) is trivially true (with $k = 0$). If $H(p) = H(q)$ and
$p^\downarrow \neq q^\downarrow$ then, strictly speaking, a transition of the form (1)
is impossible. In this case, however, one can find full-rank
approximations $q'$ that are arbitrarily close to q and that satisfy
$H(q') > H(q) = H(p)$, such that (3) holds for q replaced
by $q'$, which makes it possible to obtain q to arbitrary accuracy
from p via transitions of the form (1).

We now discuss the special cases of (3) for different values
of k, summarized also in Table I.

If $k = 0$, then (3) reduces to majorization itself. If we
demand that (3) holds for $k = 1$, we ask for some distribution
r such that

$$p \otimes r \succ q \otimes r. \qquad (4)$$

This notion has been introduced in entanglement theory
and is called trumping. That is, p trumps q, denoted $p \succ_T q$,
if and only if there is some distribution r such that (4) holds.
If $p \not\succ q$ but $p \succ_T q$ then the auxiliary distribution r acts
like a “catalyst”. The interpretation is similar to a catalyst in
chemistry: it enables transitions $p \to q$ that are impossible
without its presence, but it is not consumed and can be reused
after the process.
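
To make the catalyst idea concrete, the following Python sketch checks condition (4) for a well-known example from the entanglement-catalysis literature (it is not taken from this paper): p = (0.5, 0.25, 0.25, 0) does not majorize q = (0.4, 0.4, 0.1, 0.1), but it does so after attaching the catalyst r = (0.6, 0.4). The helper names `kron` and `majorizes` and the tolerance are assumptions of this sketch.

```python
def kron(p, r):
    """Kronecker (product) distribution of p and r as a flat list."""
    return [x * y for x in p for y in r]

def majorizes(p, q, tol=1e-12):
    """Partial-sum test for majorization; tol absorbs rounding errors."""
    ps, qs = sorted(p, reverse=True), sorted(q, reverse=True)
    sp = sq = 0.0
    for a, b in zip(ps, qs):
        sp += a
        sq += b
        if sp < sq - tol:
            return False
    return True

p = [0.5, 0.25, 0.25, 0.0]
q = [0.4, 0.4, 0.1, 0.1]
r = [0.6, 0.4]  # candidate catalyst

direct = majorizes(p, q)                       # plain majorization
catalyzed = majorizes(kron(p, r), kron(q, r))  # condition (4)
```

Here `direct` is false while `catalyzed` is true, so p trumps q in this example. Note that such a check only verifies a given candidate r; it does not search for a catalyst.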

Motivated by this nomenclature, we call our new relation
correlated trumping, or c-trumping, and say that p c-trumps q,
denoted $p \succ_c q$, if and only if there exists $k \in \mathbb{N}_0$ and $r_{1,2,\ldots,k}$
such that (3) holds. As stated in Theorem 1, the case $k = 3$ is
equivalent to leaving k arbitrary, i.e. equivalent to c-trumping,
and so is any fixed value $k \geq 4$.

Understanding the case $k = 2$ remains an interesting open
problem. We conjecture that $k = 2$ is equivalent to c-trumping,
too, but have not been able to prove this.² An example of
c-trumping with $k = 2$ auxiliary systems can be found in [11],
though in a more general framework in which systems are
allowed to carry Hamiltonians (energy). Using the construction
of Theorem 3 in the Supplemental Material of [11], one can
obtain a pair of (high-dimensional) probability distributions
p, q from that example, such that $p \not\succ_T q$, but
$p \otimes r_1 \otimes r_2 \succ q \otimes r_{1,2}$ for a suitable auxiliary
distribution $r_{1,2}$, and thus $p \succ_c q$.
While $k = 2$ is sufficient for this particular choice of p and q,
we do not know whether it is in all cases.

For any two given distributions $p, q \in \mathbb{R}^m$, one can check
directly whether $p \succ q$ by using the definition of majorization,
(2). In contrast, the trumping relation $p \succ_T q$ is defined
implicitly via the existence of a catalyst r satisfying (4), which
cannot be checked directly. Thus, it has been an open problem
for some time to give necessary and sufficient conditions that
allow one to decide whether or not $p \succ_T q$ holds.

²We currently need k = 3 catalysts in the proof of Theorem 1 for the
following reason: since Rényi entropies $H_\alpha$ with $0 < \alpha < 1$ behave very
differently from those with $1 < \alpha < \infty$, the auxiliary distribution $r_{1,2,3}$ is
constructed in two steps, yielding a tripartite distribution.

TABLE I
Different majorization-like relations arising as special cases of (3).

Case in (3) | Notation      | Name         | Complete set of monotones
------------+---------------+--------------+-------------------------------------------------
k = 0       | $\succ$       | majorization | partial sums $S_k(p) := \sum_{i=1}^k p_i^\downarrow$
            |               |              | ($k = 1, \ldots, m-1$) if $p \in \mathbb{R}^m$
k = 1       | $\succ_T$     | trumping     | Rényi entropies $H_\alpha$ ($\alpha \in \mathbb{R} \setminus \{0\}$)
            |               |              | and Burg entropy $H_{\mathrm{Burg}}$
k = 2       | ?             | -            | ?
k = 3       | $\succ_c$     | c-trumping   | Shannon entropy $H$ and Hartley entropy $H_0$
k ≥ 4       | same as k = 3 | ”            | ”

This problem has been settled in the works of Klimesh Cl 
and Turgut IS) . To understand their criterion, we need to define 
the Renyi and Burg entropies which will play a major role later 
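on in the proofs as well. For probability distributions $p \in \mathbb{R}^m$
and real parameters $\alpha \in \mathbb{R} \setminus \{0, 1\}$, we define the Rényi entropy
of order $\alpha$ as

$$H_\alpha(p) := \frac{\mathrm{sgn}(\alpha)}{1 - \alpha} \log \sum_{i=1}^m p_i^\alpha \qquad (\alpha \in \mathbb{R} \setminus \{0, 1\}).$$

Furthermore, we set

$$H_\infty(p) := -\log \max_i p_i, \qquad H_{-\infty}(p) := \log \min_i p_i,$$

$$H_1(p) := H(p), \qquad H_0(p) := \log \mathrm{rank}(p).$$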

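This choice of definition ensures continuity of $H_\alpha$ in $\alpha$ except
at $\alpha = 0$, in the sense that

$$\lim_{\alpha \to \infty} H_\alpha(p) = H_\infty(p), \qquad \lim_{\alpha \to 1} H_\alpha(p) = H_1(p),$$

$$\lim_{\alpha \to -\infty} H_\alpha(p) = H_{-\infty}(p), \qquad \lim_{\alpha \searrow 0} H_\alpha(p) = H_0(p).$$

However, $\lim_{\alpha \nearrow 0} H_\alpha(p)$ exists only if p has “full rank”, i.e.
$\mathrm{rank}(p) = m$, in which case it equals $-\log m = -H_0(p)$.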
The Burg entropy IITtI is defined as 

$$H_{\mathrm{Burg}}(p) := \frac{1}{m} \sum_{i=1}^m \log p_i.$$

Sometimes different conventions are used in the literature El;
the prefactor 1/m ensures that $H_{\mathrm{Burg}}$ is additive,
i.e. $H_{\mathrm{Burg}}(p \otimes q) = H_{\mathrm{Burg}}(p) + H_{\mathrm{Burg}}(q)$. Note that $H_{\mathrm{Burg}}$
and $H_\alpha$ for $\alpha < 0$ attain the value $-\infty$ if p contains any
zeros. $H_0$ is also known as Hartley entropy or max entropy.
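
The conventions above (the sgn(α) prefactor, the special orders 0, 1, ±∞, and the value −∞ in the presence of zeros) can be collected in a short Python sketch. The function names and the handling of special cases are choices made for this illustration only.

```python
import math

def renyi_entropy(p, alpha):
    """Renyi entropy H_alpha with prefactor sgn(alpha) / (1 - alpha),
    including the special orders 0, 1 and +/- infinity."""
    if alpha == math.inf:
        return -math.log(max(p))
    if alpha == -math.inf:
        return math.log(min(p)) if min(p) > 0 else -math.inf
    if alpha == 1:
        return -sum(x * math.log(x) for x in p if x > 0)
    if alpha == 0:
        return math.log(sum(1 for x in p if x > 0))
    if alpha < 0 and min(p) == 0:
        return -math.inf  # negative orders diverge if p contains zeros
    s = sum(x ** alpha for x in p if x > 0)
    sign = 1.0 if alpha > 0 else -1.0
    return sign / (1.0 - alpha) * math.log(s)

def burg_entropy(p):
    """Burg entropy (1/m) sum_i log p_i; -infinity if p contains zeros."""
    if min(p) == 0:
        return -math.inf
    return sum(math.log(x) for x in p) / len(p)

def kron(p, q):
    """Product distribution of p and q as a flat list."""
    return [x * y for x in p for y in q]
```

For the uniform distribution on m outcomes this gives `renyi_entropy(u, alpha) == sgn(alpha) * log(m)` (up to rounding) and `burg_entropy(u) == -log(m)`, in line with the extremal values quoted in the text.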

These entropies characterize the trumping relation as fol¬ 
lows. 

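Lemma 2 (Trumping /(T^, jl^): Let $p, q \in \mathbb{R}^m$ be probability
distributions such that $p^\downarrow \neq q^\downarrow$, and such that at least one of
them has full rank. Then $p \succ_T q$ if and only if

$$H_\alpha(p) < H_\alpha(q) \quad \text{for all } \alpha \in \mathbb{R} \setminus \{0\}, \text{ and}$$

$$H_{\mathrm{Burg}}(p) < H_{\mathrm{Burg}}(q).$$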

Thus, fixing different values of k in (3) naturally gives rise to
different notions of entropy that characterize the corresponding
relations. A summary is shown in Table I. Given some relation
$\succ'$ on the probability distributions, we say that a real function
S is a monotone if $p \succ' q \Rightarrow S(p) \leq S(q)$. A set of monotones
$(S_i)_{i \in I}$ will be called complete for the relation $\succ'$ if
$S_i(p) < S_i(q)$ for all $i \in I$ implies that $p \succ' q$ (whether one would like
to have strict or rather non-strict inequality, $S_i(p) \leq S_i(q)$,
may depend on the context, and does so in Table I). Thus,
Lemma 2 can be understood as saying that the Rényi and
Burg entropies constitute a complete set of monotones for the
trumping relation. Similarly, Theorem 1 says that the Shannon
and Hartley entropies are a complete set of monotones for
c-trumping.

Our second result is an immediate consequence of Theorem 1.
As mentioned above, while the result in Corollary 3 is
not new (a slightly stronger version has been proved in [10]),
our proof seems to be considerably simpler once Theorem 1 is
established. Denote the probability distributions without zeros
by

$$\Delta_n^+ := \left\{ (p_1, \ldots, p_n) \in \mathbb{R}^n \;\middle|\; p_i > 0, \; \sum_i p_i = 1 \right\}$$

and set $\Delta^+ := \bigcup_{n \in \mathbb{N}} \Delta_n^+$. Then we have the following:

Corollary 3: A continuous function $S : \Delta^+ \to \mathbb{R}$ satisfies
the following three properties

(i) symmetry: if $p, q \in \Delta^+$ are such that $p_i = q_{\pi(i)}$ for
some permutation $\pi$ and all i, then $S(p) = S(q)$;

(ii) subadditivity: $S(p_{AB}) \leq S(p_A \otimes p_B)$ for every bipartite
probability distribution $p_{AB} \in \Delta^+$ with marginals $p_A$
and $p_B$;

(iii) additivity: $S(p_A \otimes p_B) = S(p_A) + S(p_B)$ for all $p_A, p_B \in \Delta^+$

if and only if it is of the form

$$S(p) = c \cdot H(p) + c_n \quad \text{for all } p \in \Delta_n^+, \; n \in \mathbb{N}, \qquad (5)$$

where $H(p) = -\sum_i p_i \log p_i$ is Shannon entropy, $c \geq 0$ some
constant, and $c_n \in \mathbb{R}$ is some dimension-dependent constant
with $c_{mn} = c_m + c_n$.
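
The easy direction of Corollary 3 (functions of the form (5) are symmetric, subadditive and additive) can be checked numerically on an example. In the Python sketch below, the choice $c_n = d \log n$ is just one assumed family of constants satisfying $c_{mn} = c_m + c_n$; the values of `c` and `d`, the function names, and the joint distribution are likewise illustrative.

```python
import math

def shannon_entropy(p):
    """Shannon entropy with natural logarithm (p has no zeros here)."""
    return -sum(x * math.log(x) for x in p if x > 0)

def S(p, c=2.0, d=0.5):
    """A function of the form (5), with the assumed choice c_n = d * log(n)."""
    return c * shannon_entropy(p) + d * math.log(len(p))

def kron(p, q):
    """Product distribution of p and q as a flat list."""
    return [x * y for x in p for y in q]

# Hypothetical correlated joint distribution on two bits (no zeros).
joint = [[0.4, 0.1],
         [0.1, 0.4]]
p_ab = [x for row in joint for x in row]
p_a = [sum(row) for row in joint]        # marginal on A
p_b = [sum(col) for col in zip(*joint)]  # marginal on B

additivity_gap = S(kron(p_a, p_b)) - (S(p_a) + S(p_b))
subadditivity_gap = S(kron(p_a, p_b)) - S(p_ab)
```

In this example `additivity_gap` vanishes up to rounding and `subadditivity_gap` is positive, because the joint distribution is correlated.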

There is a vast literature on characterizations of Shannon 
entropy, see e.g. El, El, El, El- Our result is a slightly 
weaker version of the characterization in [10, Lemma 5],
which does not presuppose continuity, and (in addition to
symmetry and additivity) only assumes weak subadditivity,
that is (ii) in the special case that B has dimension two. It
turns out that Theorem 1 admits a straightforward proof of yet
another version of Corollary 3 which characterizes functions
of the form (5) as those that satisfy Schur concavity, additivity,
and subadditivity on $\Delta^+$, without assuming continuity. Schur
concavity of S means that $q = \Phi(p)$ for some bistochastic
map $\Phi$ implies $S(q) \geq S(p)$, which is a property that one
would intuitively expect from any “measure of disorder”.
However, since the proof is somewhat more involved than that
of Corollary 3, and since the result follows directly from those
in [10], we omit the details.


III. Proof of Theorem 1

We start by fixing some notation. We say that a function
$f : I \to \mathbb{R}$ with $I \subset \mathbb{R}$ is increasing if $x < y \Rightarrow f(x) \leq f(y)$
for all $x, y \in I$, and that it is strictly increasing if
$x < y \Rightarrow f(x) < f(y)$ (analogous definitions apply to
decreasing / strictly decreasing). We will use the elementary
limit identity Q 

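$$H_{\mathrm{Burg}}(p) + \log m = \lim_{\alpha \searrow 0} \frac{1 - \alpha}{\alpha} \left( H_\alpha(p) - \log m \right) = \lim_{\alpha \nearrow 0} \frac{1 - \alpha}{\alpha} \left( -H_\alpha(p) - \log m \right). \qquad (6)$$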
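Furthermore, note that Rényi entropy satisfies

$$H_\alpha(p) \in \begin{cases} [0, \log m] & \text{if } \alpha > 0 \\ [-\infty, -\log m] & \text{if } \alpha < 0, \end{cases} \qquad (7)$$

and for every $\alpha \neq 0$, the maximal value $\mathrm{sgn}(\alpha) \log m$ is
attained if and only if $p = (1/m, \ldots, 1/m)$, cf. IJOj. The
corresponding statement for the Burg entropy is $H_{\mathrm{Burg}}(p) \leq -\log m$,
with equality if and only if $p = (1/m, \ldots, 1/m)$.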

In the following, we will deal with multipartite (mostly
bipartite) probability distributions. In the bipartite case, we
use the following notation. We denote the first system by A
(of size $m \in \mathbb{N}$), and the second by B (of size $n \in \mathbb{N}$).
Joint distributions on AB will be denoted as matrices with
entries $(p_{AB})_{i,j} := p(a = i, b = j)$. For example, if
$p = p_A = (p_1, \ldots, p_m)$ and $q = q_B = (q_1, \ldots, q_n)$, then

$$p_A \otimes q_B = \begin{pmatrix} p_1 q_1 & p_1 q_2 & p_1 q_3 & \cdots & p_1 q_n \\ p_2 q_1 & p_2 q_2 & p_2 q_3 & \cdots & p_2 q_n \\ \vdots & \vdots & \vdots & & \vdots \\ p_m q_1 & p_m q_2 & p_m q_3 & \cdots & p_m q_n \end{pmatrix}.$$

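In general, the marginal distributions on A resp. B can be
obtained by summing over the rows resp. columns of $p_{AB}$.
There is a specific family of bipartite probability distributions
that will be important in what follows. If we have any
probability distribution $q = q_A = (q_1, \ldots, q_m) \in \mathbb{R}^m$, we
consider the specific extension

$$q_{AB} := \begin{pmatrix} q_1 - a_1 & \frac{a_1}{n} & \frac{a_1}{n} & \cdots & \frac{a_1}{n} \\ q_2 - a_2 & \frac{a_2}{n} & \frac{a_2}{n} & \cdots & \frac{a_2}{n} \\ \vdots & \vdots & \vdots & & \vdots \\ q_m - a_m & \frac{a_m}{n} & \frac{a_m}{n} & \cdots & \frac{a_m}{n} \end{pmatrix} \qquad (8)$$

for any choice of $a_i \in [0, q_i]$ and $n \in \mathbb{N}$. This is an $m \times (n+1)$
matrix, and a bipartite probability distribution with marginal
$q_A$ on A (which is what the word “extension” means here).
Clearly

$$q_B = \left( 1 - a, \frac{a}{n}, \ldots, \frac{a}{n} \right), \quad \text{where } a = \sum_{i=1}^m a_i.$$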
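
The extension (8) is easy to build explicitly, which also makes the statements about its marginals checkable. The Python sketch below is a direct transcription; the function names and the example values of q, $a_i$ and n are assumptions made for illustration.

```python
def extension(q, a, n):
    """Return the m x (n+1) matrix q_AB of Eq. (8) as a list of rows.

    Row i is (q_i - a_i, a_i/n, ..., a_i/n) with n copies of a_i/n.
    """
    rows = []
    for qi, ai in zip(q, a):
        if not 0 <= ai <= qi:
            raise ValueError("each a_i must lie in [0, q_i]")
        rows.append([qi - ai] + [ai / n] * n)
    return rows

def marginal_A(matrix):
    """Marginal on A: one entry per row."""
    return [sum(row) for row in matrix]

def marginal_B(matrix):
    """Marginal on B: one entry per column."""
    return [sum(col) for col in zip(*matrix)]

# Hypothetical example values.
q = [0.5, 0.3, 0.2]
a = [0.1, 0.1, 0.1]
n = 4
q_ab = extension(q, a, n)
q_a = marginal_A(q_ab)  # reproduces q
q_b = marginal_B(q_ab)  # (1 - sum(a), sum(a)/n, ..., sum(a)/n)
```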

We need two lemmas. The first one is as follows. 

Lemma 4: Let $p, q \in \mathbb{R}^m$ be probability distributions such
that q has full rank, $H(p) < H(q)$, and $q \neq (1/m, \ldots, 1/m)$.
Then there exists some $\delta \in (0, \min_i q_i)$ and $N \in \mathbb{N}$ such that
for $a_i := q_i - \delta$ and $q_{AB}$ as in (8), the following statement is
true for all $n \geq N$:

$$H_\alpha(p_A \otimes q_B) < H_\alpha(q_{AB}) \quad \text{for all } \alpha \in [1, +\infty].$$

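Proof: Note that $p \neq (1/m, \ldots, 1/m)$ because $H(p) <
H(q) < \log m$. In the following, we will always assume that
$\alpha > 1$, $\alpha \in \mathbb{R}$ (unless stated otherwise). With the given choice
of $a_i$, we get $a = \sum_i a_i = 1 - m\delta$. Consider the following
expression:

$$\Delta_n^{(\alpha)} := H_\alpha(q_{AB}) - H_\alpha(q_B) - H_\alpha(p_A) = \frac{1}{1 - \alpha} \log \frac{m \delta^\alpha + \sum_{i=1}^m (q_i - \delta)^\alpha \, n^{1-\alpha}}{\left( \sum_{i=1}^m p_i^\alpha \right) \left( (m\delta)^\alpha + (1 - m\delta)^\alpha \, n^{1-\alpha} \right)}.$$

We use the expression on the right-hand side to define $\Delta_n^{(\alpha)}$
also for non-integer $n \geq 1$. We have to show that this
expression is positive for all $\alpha$ if n is large enough. In fact,
in the limit,

$$\lim_{n \to \infty} \Delta_n^{(\alpha)} = \log m - H_\alpha(p) > 0 \quad \text{for all } \alpha > 1, \qquad (9)$$

which is however only a pointwise statement. We furthermore
need the fact that

$$\Delta_n^{(\alpha)} \text{ is strictly increasing in } n \text{ if } \alpha \in (1, \infty). \qquad (10)$$

We prove this by checking that $\left( \sum_{i=1}^m p_i^\alpha \right) \exp\left( (1 - \alpha) \Delta_n^{(\alpha)} \right)$
is strictly increasing in $x := n^{1-\alpha}$, and hence strictly decreasing in n,
since $\alpha > 1$. This expression is of the form
$f(x) := (a + bx)/(c + dx)$, where $a = m\delta^\alpha$,
$b = \sum_{i=1}^m (q_i - \delta)^\alpha$, $c = (m\delta)^\alpha$, and $d = (1 - m\delta)^\alpha$. We
have $f'(x) > 0$ if and only if $ad < bc$, which (after some
simplification) is equivalent to

$$H_\alpha\left( \left( \frac{q_i - \delta}{1 - m\delta} \right)_i \right) < \log m,$$

and this inequality is satisfied since q is not the uniform
distribution and because of (7). Since the prefactor $1/(1-\alpha)$ is
negative for $\alpha > 1$, this proves (10).

Furthermore, for $\alpha = 1$, we have

$$\Delta_n^{(1)} := H(q_{AB}) - H(q_B) - H(p_A) = m\delta \log m - \sum_{i=1}^m (q_i - \delta) \log \frac{q_i - \delta}{1 - m\delta} - H(p_A),$$

and this expression is independent of n. Since
$\lim_{\delta \searrow 0} \Delta_n^{(1)} = H(q_A) - H(p_A) > 0$, there exists some
$\delta \in (0, \min_i q_i)$ such that with this choice of $\delta$, we have
$\Delta_n^{(1)} > 0$. So let us choose and fix this $\delta$ for all that follows.
By continuity, for $n = 1$, there exists some $\varepsilon > 0$ such that
$\Delta_1^{(\alpha)} > 0$ for all $1 \leq \alpha \leq 1 + \varepsilon$, and due to (10)

$$\Delta_n^{(\alpha)} > 0 \quad \text{for all } n \in \mathbb{N} \text{ and } 1 \leq \alpha \leq 1 + \varepsilon. \qquad (11)$$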
Furthermore, if n is large enough, then we have the exact
equality

$$\Delta_n^{(\infty)} := H_\infty(q_{AB}) - H_\infty(q_B) - H_\infty(p_A) = \log m - H_\infty(p) > 0.$$

Applying Lemma 6 below to the family of functions $\alpha \mapsto \Delta_n^{(\alpha)}$
on the interval $[1 + \varepsilon, \infty]$ (while taking into account (9)
and (10)) shows that there exists some N such that for all
$n \geq N$, we have $\Delta_n^{(\alpha)} > 0$ for all $\alpha$ in that interval. Together
with (11), this proves the claim. ■
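
The closed-form expression for $\Delta_n^{(1)}$ used in the proof of Lemma 4, and its independence of n, can be cross-checked numerically against the definition. The Python sketch below does this for one hypothetical choice of p, q and δ; all names and numbers are assumptions made for illustration, and the check is of course no substitute for the proof.

```python
import math

def H(p):
    """Shannon entropy with natural logarithm and 0 log 0 := 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def delta_1_direct(p, q, delta, n):
    """H(q_AB) - H(q_B) - H(p_A) for the extension (8) with a_i = q_i - delta."""
    m = len(q)
    q_ab = [x for qi in q for x in [delta] + [(qi - delta) / n] * n]
    q_b = [m * delta] + [(1.0 - m * delta) / n] * n
    return H(q_ab) - H(q_b) - H(p)

def delta_1_closed(p, q, delta):
    """Closed form: m*delta*log(m) - sum_i (q_i - delta) log((q_i - delta)/(1 - m*delta)) - H(p)."""
    m = len(q)
    s = sum((qi - delta) * math.log((qi - delta) / (1.0 - m * delta)) for qi in q)
    return m * delta * math.log(m) - s - H(p)

# Hypothetical example with H(p) < H(q), q of full rank and not uniform.
p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
delta = 0.05  # must lie in (0, min_i q_i)

closed = delta_1_closed(p, q, delta)
direct = [delta_1_direct(p, q, delta, n) for n in (1, 5, 50)]
```

For this example the values in `direct` agree with `closed` up to rounding and are positive, as the argument requires for a suitable δ.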

The second lemma which now follows is interesting in its
own right. It gives a partial answer to the question under
which conditions we can have a different kind of “correlated
trumping relation”: instead of asking whether a transformation
$p_A \otimes r_B \to q_A \otimes r_B$ is possible (corresponding to $\succ_T$),
one might allow that correlations between the two systems
build up, such that AB is finally described by a correlated
distribution $q_{AB}$ with marginal $q_B = r_B$. In this sense, the
“catalyst” would be retained in its original form, but correlated
with the system that is to be transformed.

An example is given by the two distributions pA — 
and gA = It is easy to see that 

PA V' gA (from the definition of majorization) and pA '^t gA 
(since Ha (pa) < Ha {gA) for all a > 1 but not for 
all a < 1). However, if gAB is the correlated distribution 
in ([8]l with n = 1 and then it turns out that 

$p_A \otimes q_B \succ_T q_{AB}$, as one can check by using Lemma 2.
That is, there exists an additional system C and a distribution
$s_C$ such that $p_A \otimes (q_B \otimes s_C) \succ q_{AB} \otimes s_C$. If we denote the
composite system BC by $B'$ and set $q_{AB'} := q_{AB} \otimes s_C$, then
we have $p_A \otimes q_{B'} \succ q_{AB'}$. This example is a special case of
the following result:

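Lemma 5: Let $p, q \in \mathbb{R}^m$ be probability distributions such
that q has full rank, $q \neq (1/m, \ldots, 1/m)$, and $H_\alpha(p) < H_\alpha(q)$ for
all $\alpha \in [1, +\infty]$. Then there exists some $a \in (0, m \cdot \min_i q_i)$
and $N \in \mathbb{N}$ such that for $q_{AB}$ as given in (8) with $a_i := a/m$,
we have

$$p_A \otimes q_B \succ_T q_{AB} \quad \text{for all } n \geq N.$$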


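Proof: First consider the case that p has full rank. Note
that $p \neq (1/m, \ldots, 1/m)$ since $H_1(p) < H_1(q) < \log m$. We will
use the criterion in Lemma 2 to prove trumping. It holds

$$H_{\mathrm{Burg}}(q_{AB}) = \frac{1}{m(n+1)} \sum_{i=1}^m \log\left( q_i - \frac{a}{m} \right) + \frac{n}{n+1} \log \frac{a}{mn},$$

$$H_{\mathrm{Burg}}(p_A \otimes q_B) = \frac{1}{m} \sum_{i=1}^m \log p_i + \frac{\log(1 - a) + n \log \frac{a}{n}}{n+1}.$$

It is then elementary to see that the inequality
$H_{\mathrm{Burg}}(p_A \otimes q_B) < H_{\mathrm{Burg}}(q_{AB})$ is equivalent to

$$\frac{1}{m} \sum_{i=1}^m \log p_i + n \underbrace{\left( \frac{1}{m} \sum_{i=1}^m \log p_i + \log m \right)}_{(*)} + \log(1 - a) < \frac{1}{m} \sum_{i=1}^m \log\left( q_i - \frac{a}{m} \right).$$

Since $\frac{1}{m} \sum_{i=1}^m \log p_i = H_{\mathrm{Burg}}(p) < -\log m$, the factor $(*)$
is negative. Hence this inequality is true if n is large enough;
in other words, there exists $N(a) \in \mathbb{N}$ (which may depend on
the choice of a) such that

$$H_{\mathrm{Burg}}(p_A \otimes q_B) < H_{\mathrm{Burg}}(q_{AB}) \quad \text{for all } n \geq N(a). \qquad (12)$$

For all $\alpha \in [-\infty, +\infty]$, define the quantity

$$\Delta_n^{(\alpha)} := H_\alpha(q_{AB}) - H_\alpha(q_B) - H_\alpha(p_A).$$

If $\alpha = 0$ this equals 0; for general finite $\alpha \notin \{0, 1\}$, it is

$$\Delta_n^{(\alpha)} = \frac{\mathrm{sgn}(\alpha)}{1 - \alpha} \log \frac{\sum_{i=1}^m \left( q_i - \frac{a}{m} \right)^\alpha + a^\alpha m^{1-\alpha} n^{1-\alpha}}{\left( \sum_{i=1}^m p_i^\alpha \right) \left( (1 - a)^\alpha + a^\alpha n^{1-\alpha} \right)}.$$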

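First we prove the following:

$$\Delta_n^{(\alpha)} \text{ is } \begin{cases} \text{eventually constant in } n & \text{if } \alpha = -\infty \\ \text{increasing in } n & \text{if } -\infty < \alpha < 1 \\ \text{constant in } n & \text{if } \alpha = 1 \\ \text{decreasing in } n & \text{if } 1 < \alpha < +\infty \\ \text{eventually constant in } n & \text{if } \alpha = +\infty. \end{cases} \qquad (13)$$

By “eventually constant”, we mean that there is some $N \in \mathbb{N}$
such that for all $n \geq N$, we have $\Delta_n^{(\alpha)} = \Delta_N^{(\alpha)}$. This is the
case for $\alpha = -\infty$ and $\alpha = +\infty$, because in this case, all
entropies only depend on the minimal resp. maximal entries
of $q_{AB}$ resp. $q_B$; if n is large, the location of these extrema
is fixed, and direct calculation shows that all n-dependency
cancels out. The special case $\alpha = 0$ is trivial; for $\alpha = 1$,
direct calculation shows that

$$\Delta_n^{(1)} = -\sum_{i=1}^m \left( q_i - \frac{a}{m} \right) \log\left( q_i - \frac{a}{m} \right) + a \log m + (1 - a) \log(1 - a) - H(p) \qquad (14)$$

which is independent of n. For the remaining cases
$\alpha \in \mathbb{R} \setminus \{0, 1\}$, we check the monotonicity of
$\left( \sum_{i=1}^m p_i^\alpha \right) \exp\left( \frac{1 - \alpha}{\mathrm{sgn}(\alpha)} \Delta_n^{(\alpha)} \right)$ in $x := n^{1-\alpha}$. This expression
is of the form $f(x) := (a' + b'x)/(c' + d'x)$, with
$a' = \sum_{i=1}^m \left( q_i - \frac{a}{m} \right)^\alpha$, $b' = a^\alpha m^{1-\alpha}$, $c' = (1 - a)^\alpha$, and
$d' = a^\alpha$. We have $f'(x) > 0$ if and only if $a'd' < b'c'$, which
is equivalent to

$$\frac{1 - \alpha}{\mathrm{sgn}(\alpha)} H_\alpha\left( \left( \frac{q_i - a/m}{1 - a} \right)_i \right) < (1 - \alpha) \log m.$$

According to (7), this inequality is true for $0 < \alpha < 1$, but
the inequality sign is reversed for $\alpha < 0$ and $\alpha > 1$. Taking
care of the signs in all the different cases of $\alpha$ proves (13).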
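By direct calculation, the large-n limit of $\Delta_n^{(\alpha)}$ evaluates to

$$\lim_{n \to \infty} \Delta_n^{(\alpha)} = \begin{cases} -\log m - H_\alpha(p) & \text{if } \alpha \in [-\infty, 0) \\ \log m - H_\alpha(p) & \text{if } \alpha \in (0, 1) \\ \text{expression (14) above} & \text{if } \alpha = 1 \\ H_\alpha\left( \left( \frac{q_i - a/m}{1 - a} \right)_i \right) - H_\alpha(p) & \text{if } \alpha \in (1, +\infty] \end{cases} \qquad (15)$$

which is discontinuous at $\alpha = 0$ and $\alpha = 1$.

So far, $a \in (0, m \cdot \min_i q_i)$ was arbitrary; now we are going
to fix the value of a in such a way that the limit in (15) is
everywhere strictly positive. To this end, set

$$f_a(\alpha) := H_\alpha\left( \left( \frac{q_i - a/m}{1 - a} \right)_i \right) - H_\alpha(p) \qquad (\alpha \in [1, +\infty]),$$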
and observe that this expression is decreasing in a (for every
fixed $\alpha \in [1, +\infty]$), as long as $a \in [0, m \cdot \min_i q_i]$. This
follows from the fact that for a, b in that interval with $a < b$,
the probability distribution $[(q_i - b/m)/(1 - b)]_i$ majorizes the
probability distribution $[(q_i - a/m)/(1 - a)]_i$, and the Rényi
entropies $H_\alpha$ with $\alpha \geq 1$ are Schur-concave ID, ns.

Choose $j \in \mathbb{N}$ large enough such that $1/(j + 1) < m \cdot \min_i q_i$,
and for all $n \in \mathbb{N}$, set $f_n(\alpha) := f_{1/(n+j)}(\alpha)$. Then
every $f_n$ is a continuous real function on $I := [1, +\infty]$, and
the monotonicity of $f_a$ in a becomes $f_n(\alpha) \leq f_{n+1}(\alpha)$ for
all $\alpha \in I$. Furthermore,

$$\lim_{n \to \infty} f_n(\alpha) = \lim_{a \searrow 0} f_a(\alpha) = H_\alpha(q) - H_\alpha(p) > 0$$

for all $\alpha \in I$. Thus, Lemma 6 below proves that there is some
$N \in \mathbb{N}$ such that $f_n(\alpha) > 0$ for all $n \geq N$ and all $\alpha \in I$;
in other words, there is some $a' \in (0, m \cdot \min_i q_i)$ such that
$f_a(\alpha) > 0$ for all $0 < a \leq a'$ and all $\alpha \in I$. Due to (13)
and (15), we thus obtain

$$\Delta_n^{(\alpha)} \geq \lim_{n \to \infty} \Delta_n^{(\alpha)} = f_a(\alpha) > 0$$

for all $\alpha \in (1, +\infty]$, $a \in [0, a']$, and all $n \in \mathbb{N}$ (recall that
$\Delta_n^{(\alpha)}$ depends on the choice of a). Due to (14), we have
$\lim_{a \searrow 0} \Delta_n^{(1)} = H(q) - H(p) > 0$, so there exists $a \in (0, a')$
such that $\Delta_n^{(1)} > 0$ for this choice of a. We now fix this
value of a for all that follows. Due to continuity, there exists
$\varepsilon > 0$ such that $\Delta_1^{(\alpha)} > 0$ for all $\alpha \in [1 - \varepsilon, 1]$. According
to (13), this implies that $\Delta_n^{(\alpha)} > 0$ for all $\alpha \in [1 - \varepsilon, 1]$ and
all $n \in \mathbb{N}$. In summary, we have achieved that

$$\Delta_n^{(\alpha)} > 0 \quad \text{for all } n \in \mathbb{N}, \; \alpha \in [1 - \varepsilon, +\infty]. \qquad (16)$$

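Next we consider $\alpha \in (0, 1 - \varepsilon)$. Since $\Delta_n^{(0)} = 0$ for all n is
not useful as a special case, we define another quantity

$$\tilde{\Delta}_n^{(\alpha)} := \begin{cases} \frac{1 - \alpha}{|\alpha|} \Delta_n^{(\alpha)} & \text{if } \alpha \in \mathbb{R} \setminus \{0\} \\ H_{\mathrm{Burg}}(q_{AB}) - H_{\mathrm{Burg}}(p_A \otimes q_B) & \text{if } \alpha = 0. \end{cases}$$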

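The resulting quantity is continuous in $\alpha$, also at $\alpha = 0$
due to (6). Using that $H_{\mathrm{Burg}}\left( \left( \frac{q_i - a/m}{1 - a} \right)_i \right) < -\log m$
(which holds because q is not the uniform distribution), it is
straightforward to check that

$$\frac{\partial \tilde{\Delta}_n^{(0)}}{\partial n} = -\frac{H_{\mathrm{Burg}}\left( \left( \frac{q_i - a/m}{1 - a} \right)_i \right) + \log m}{(n + 1)^2} > 0,$$

hence $\tilde{\Delta}_n^{(0)}$ is strictly increasing in n. The large-n limit is

$$\lim_{n \to \infty} \tilde{\Delta}_n^{(0)} = -H_{\mathrm{Burg}}(p) - \log m > 0$$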

since p is not the uniform distribution. Considering only
$\alpha \in [0, 1 - \varepsilon]$, the $\tilde{\Delta}_n^{(\alpha)}$ are an increasing sequence of continuous
functions on this compact interval, converging pointwise to a
strictly positive continuous function due to (13), (15), and (6).
Thus, Lemma 6 below proves that there exists some $N' \in \mathbb{N}$
such that $\tilde{\Delta}_n^{(\alpha)} > 0$ for all $n \geq N'$ and $\alpha \in [0, 1 - \varepsilon]$, hence

$$\Delta_n^{(\alpha)} > 0 \quad \text{for all } n \geq N', \; \alpha \in (0, 1 - \varepsilon]. \qquad (17)$$

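Now we come to the case $\alpha < 0$. According to (13) and (15),
there exists $N'' \in \mathbb{N}$ such that for all $n \geq N''$, it holds
$\Delta_n^{(-\infty)} = -\log m - H_{-\infty}(p) > 0$. Due to continuity, there
is some $\alpha_- \in \mathbb{R}$ such that $\Delta_{N''}^{(\alpha)} > 0$ for all $\alpha \in [-\infty, \alpha_-]$,
and thus (again due to (13))

$$\Delta_n^{(\alpha)} > 0 \quad \text{for all } n \geq N'', \; \alpha \in [-\infty, \alpha_-]. \qquad (18)$$

Finally we treat the range $\alpha \in (\alpha_-, 0)$. Arguing as above,
the $\tilde{\Delta}_n^{(\alpha)}$ are an increasing sequence of continuous functions
on the compact interval $[\alpha_-, 0]$, converging pointwise to a
strictly positive continuous function. According to Lemma 6
below, there exists some $N''' \in \mathbb{N}$ such that $\tilde{\Delta}_n^{(\alpha)} > 0$ for all
$n \geq N'''$, and thus

$$\Delta_n^{(\alpha)} > 0 \quad \text{for all } n \geq N''', \; \alpha \in [\alpha_-, 0). \qquad (19)$$

Combining (12), (16), (17), (18), and (19), and setting
$N := \max\{N(a), N', N'', N'''\}$, we get

$$H_\alpha(p_A \otimes q_B) < H_\alpha(q_{AB}) \quad \text{for all } \alpha \in \mathbb{R} \setminus \{0\}, \text{ and}$$

$$H_{\mathrm{Burg}}(p_A \otimes q_B) < H_{\mathrm{Burg}}(q_{AB}) \quad \text{for all } n \geq N.$$

Clearly $(p_A \otimes q_B)^\downarrow \neq q_{AB}^\downarrow$, because otherwise we would have
$H(p_A \otimes q_B) = H(q_{AB})$. Furthermore, $q_{AB}$ has full rank. Thus,
Lemma 2 proves that $p_A \otimes q_B \succ_T q_{AB}$.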

We have proven the statement of the lemma in the case that
p has full rank. Now consider the case that $\mathrm{rank}(p) < m$.
Since q and thus $q_{AB}$ has full rank, we only have to show
that $H_\alpha(p_A \otimes q_B) < H_\alpha(q_{AB})$ for all $\alpha \in (0, +\infty)$. To this
end, we can simply repeat the proof above with a few small
changes. First, the cases of Burg entropy and Rényi entropy
for $\alpha < 0$ can be ignored. Second, the proof of (16) remains
valid, but the proof of (17) has to be changed; instead of
$\tilde{\Delta}_n^{(\alpha)}$, we have to consider the quantity $\Delta_n^{(\alpha)}$ directly, which
now satisfies $\Delta_n^{(0)} = \log m - H_0(p) > 0$ for all n. The rest of
the argumentation remains unchanged, proving the statement
of the lemma also for the case that p does not have full rank.

■ 
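
The monotonicity pattern (13) used in the proof of Lemma 5 can be probed numerically by evaluating $\Delta_n^{(\alpha)}$ directly from its definition for the extension (8) with $a_i = a/m$. The Python sketch below does this for a few finite orders; the example distributions, the value of a, the helper names, and the tolerance are all assumptions made for illustration, and a numerical check on a handful of values is only a sanity test.

```python
import math

def renyi(p, alpha):
    """Renyi entropy with prefactor sgn(alpha)/(1 - alpha) for finite
    alpha not in {0, 1}; p is assumed to contain no zeros."""
    sign = 1.0 if alpha > 0 else -1.0
    return sign / (1.0 - alpha) * math.log(sum(x ** alpha for x in p))

def delta(p, q, a, n, alpha):
    """H_alpha(q_AB) - H_alpha(q_B) - H_alpha(p_A) with a_i = a/m in (8)."""
    m = len(q)
    q_ab = [x for qi in q for x in [qi - a / m] + [a / (m * n)] * n]
    q_b = [1.0 - a] + [a / n] * n
    return renyi(q_ab, alpha) - renyi(q_b, alpha) - renyi(p, alpha)

def is_monotone(values, increasing, tol=1e-12):
    """Check (non-strict) monotonicity of a finite sequence."""
    pairs = list(zip(values, values[1:]))
    if increasing:
        return all(y >= x - tol for x, y in pairs)
    return all(y <= x + tol for x, y in pairs)

# Hypothetical example: q has full rank and is not uniform, 0 < a < m * min(q).
p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
a = 0.3

def sequence(alpha):
    return [delta(p, q, a, n, alpha) for n in range(1, 30)]
```

For this example, `sequence(alpha)` is increasing for the orders below 1 that were tried (such as -2, -0.5 and 0.5) and decreasing for orders above 1 (such as 2 and 5), consistent with (13).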

The previous two lemmas have made use of the following 
basic result, which is a simple consequence of Dini’s theorem. 

Lemma 6: Let $-\infty \leq a < b \leq +\infty$, and $(f_n)_{n \in \mathbb{N}}$ a family
of continuous real functions on $I := [a, b]$. (If $b = +\infty$
we demand that every $f_n$ is continuous on $[a, +\infty)$ and
that the limit $f_n(+\infty) := \lim_{x \to +\infty} f_n(x)$ exists for all n;
analogously for the case $a = -\infty$). Suppose that the family
of functions is increasing, i.e. $f_n(x) \leq f_{n+1}(x)$ for all $x \in I$,
and that $\lim_{n \to \infty} f_n(x) = f(x)$ for some continuous strictly
positive function $f : I \to \mathbb{R}$. Then there is some $N \in \mathbb{N}$ such
that $f_n(x) > 0$ for all $n \geq N$ and all $x \in I$.

Proof: If either $a = -\infty$ or $b = +\infty$ (or both), we can
consider the functions $\tilde{f}_n(y) := f_n(\tan y)$ for
$y \in \arctan I = [\arctan a, \arctan b] \subset [-\pi/2, \pi/2]$ instead of the $f_n$, and in
this way reduce everything to the case that $I \subset \mathbb{R}$. But in this
case, Dini’s theorem proves that the convergence $f_n \to f$ is
uniform, hence with $\varepsilon := \min_{x \in I} f(x) > 0$ there is some
$N \in \mathbb{N}$ such that $|f(x) - f_n(x)| < \varepsilon/2$ and therefore $f_n(x) > 0$
for all $x \in I$ and $n \geq N$. ■

Combining Lemmas 4 and 5 yields a first formulation of
our main result.

Lemma 7: Let $p, q \in \mathbb{R}^m$ be probability distributions such
that q has full rank. If $H(p) < H(q)$ then there exists
$k \in \mathbb{N}$ (in fact, we can always choose $k = 3$) and a k-partite
distribution $r_{1,2,\ldots,k}$ with marginals $r_1, r_2, \ldots, r_k$ such that

$$p \otimes (r_1 \otimes r_2 \otimes \ldots \otimes r_k) \succ q \otimes r_{1,2,\ldots,k}.$$

Proof: The special case that q = is trivial; 

in this case p q, and we can simply set A: = 0 (no auxiliary 
system), or alternatively A: = 1 with an arbitrary auxiliary 
distribution. 

So suppose q We first apply Lemma E] to 

conclude that there exists some extension qAB <1 — QA such 
HaiPA 'Si qs) < Ha{qAB) for all a e [l,+oo]. Clearly the 
extension qAB given in that lemma has full rank, but is not a 
uniform distribution. Therefore, we can apply Lemma |5] to the 
two distributions pA S qB and qAB, and obtain the existence 
of an extension qABC (introducing a third system C) of qAB 
such that 

(PA SqB) S qc gr Qabc- 

By definition of trumping, there is an additional system D and 
a catalyst (probability distribution) cd on D such that 

PaS qB S qc S CD >- qABC S cd- 

Since the majorization relation is preserved under the tensor 
product with another probability distribution, we obtain 

Pa S qB S qc S CD S qs >- Qabc S cd S qs, 

where qE = q — qA another copy of q (note however that 
qB and qc are in general not copies of g = g^). Swapping 
systems A and E on the right-hand side does not alter the 
probability values and the majorization order, thus 

PA S (qe S qB S qc S Cd) > qAS (qEBC S cd)- 

If we regard $CD$ as a single system (which we may, since the
marginal of $q_{EBC} \otimes c_D$ on $CD$ is $q_C \otimes c_D$), we see
that we have $k = 3$ subsystems in addition to system $A$. ■
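The proof above relies on two elementary facts: majorization is preserved under the tensor product with a further distribution, and reordering tensor factors only permutes the entries and so leaves the majorization order unchanged. The following Python sketch checks both facts numerically on one small example. The helper names `majorizes` and `tensor` and the example distributions are illustrative assumptions, not taken from the paper.

```python
from itertools import accumulate

def majorizes(p, q, tol=1e-12):
    """Return True if p majorizes q (same length, both summing to 1).

    Compares the partial sums of the entries sorted in descending order.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    cp = accumulate(sorted(p, reverse=True))
    cq = accumulate(sorted(q, reverse=True))
    return all(a >= b - tol for a, b in zip(cp, cq))

def tensor(p, q):
    """Product distribution p (x) q as a flat list."""
    return [a * b for a in p for b in q]

# Hypothetical example distributions.
p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
c = [0.6, 0.4]  # arbitrary extra distribution

assert majorizes(p, q)
# Tensoring both sides with c preserves the order ...
assert majorizes(tensor(p, c), tensor(q, c))
# ... and so does swapping tensor factors on one side,
# since this only permutes the entries.
assert majorizes(tensor(c, p), tensor(q, c))
```

A single example is of course no proof; the sketch only illustrates what the two steps in the argument assert.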

Now we are ready to prove our main result, Theorem 1.

Proof: Suppose there exists an auxiliary distribution
$r_{1,2,\ldots,k}$ with the stated properties. Then we can apply
additivity and subadditivity m, EH as well as Schur concavity [12]
of the Rényi entropies of orders $\alpha = 0$ and $\alpha = 1$ (Hartley
and Shannon entropy) and obtain

$$H_\alpha(p) + \sum_{i=1}^{k} H_\alpha(r_i) \leq H_\alpha(q) + H_\alpha(r_{1,2,\ldots,k}) \leq H_\alpha(q) + \sum_{i=1}^{k} H_\alpha(r_i).$$

Since $H_0(p) = \log \operatorname{rank}(p)$, this shows that
$\operatorname{rank}(p) \leq \operatorname{rank}(q)$. For Shannon
entropy $H = H_1$, we obtain equality in the second inequality of this
expression (subadditivity) if and only if
$r_{1,2,\ldots,k} = r_1 \otimes r_2 \otimes \ldots \otimes r_k$; this
follows inductively from the fact that the mutual information of two
random variables is zero if and only if the joint bipartite probability
distribution factorizes [22]. So if we had $H(p) = H(q)$ then
$p \otimes (r_1 \otimes r_2 \otimes \ldots \otimes r_k) \succ q \otimes (r_1 \otimes r_2 \otimes \ldots \otimes r_k)$,
or $p \succ_T q$. But then Lemma 2 (possibly after removing common
zeros from $p$ and $q$ as in the paragraph below) would prove that
$H(p) < H(q)$, which is a contradiction.
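The three entropy properties used in this direction of the proof can be checked numerically. The sketch below does so for Hartley and Shannon entropy on one correlated bipartite distribution. The matrix `r12` and the function names are hypothetical choices made for illustration only.

```python
from math import log2

def shannon(p):
    """Shannon entropy H_1 in bits."""
    return -sum(x * log2(x) for x in p if x > 0)

def hartley(p):
    """Hartley entropy H_0 = log rank(p) in bits."""
    return log2(sum(1 for x in p if x > 0))

# A correlated bipartite distribution (rows: system 1, columns: system 2).
r12 = [[0.4, 0.1],
       [0.1, 0.4]]
r1 = [sum(row) for row in r12]        # marginal on system 1
r2 = [sum(col) for col in zip(*r12)]  # marginal on system 2
joint = [x for row in r12 for x in row]
product = [a * b for a in r1 for b in r2]

for H in (shannon, hartley):
    # Additivity on the product of the marginals.
    assert abs(H(product) - (H(r1) + H(r2))) < 1e-12
    # Subadditivity on the correlated distribution.
    assert H(joint) <= H(r1) + H(r2) + 1e-12

# For Shannon entropy the gap is the mutual information, which is
# strictly positive here because r12 does not factorize.
mutual_information = shannon(r1) + shannon(r2) - shannon(joint)
assert mutual_information > 0
```

The strict positivity of `mutual_information` is the numerical counterpart of the equality condition invoked in the proof.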


Conversely, suppose that $p, q \in \mathbb{R}^m$ are probability
distributions that are not equal up to permutation and satisfy
$\operatorname{rank}(p) \leq \operatorname{rank}(q)$ and
$H(p) < H(q)$. Without loss of generality we may assume that
$p^\downarrow = p$ and $q^\downarrow = q$, i.e. that the entries of $p$
and $q$ are in descending order. Let
$\ell := \operatorname{rank}(q)$, then $\ell \leq m$ and
$q = \tilde q \oplus 0_{m-\ell}$, where
$\tilde q = (q_1, \ldots, q_\ell) \in \mathbb{R}^\ell$ has full rank,
and $0_{m-\ell} = (0, \ldots, 0) \in \mathbb{R}^{m-\ell}$ is the zero
vector of dimension $m - \ell$. Since
$\operatorname{rank}(p) \leq \operatorname{rank}(q) = \ell$, we can
also write $p = \tilde p \oplus 0_{m-\ell}$, where
$\tilde p \in \mathbb{R}^\ell$ does not necessarily have full rank.
Then (12) for some probability distribution $r_{1,2,\ldots,k}$ is
equivalent to

$$\tilde p \otimes (r_1 \otimes r_2 \otimes \ldots \otimes r_k) \succ \tilde q \otimes r_{1,2,\ldots,k}.$$

Since $H(\tilde p) = H(p) < H(q) = H(\tilde q)$, and since $\tilde q$
has full rank, Lemma 7 applies and shows that a probability
distribution $r_{1,2,\ldots,k}$ exists that satisfies this relation. ■
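The reduction step in this direction (sort both distributions in descending order, then cut off the common trailing zeros) can be sketched as follows. The function name `strip_common_zeros` and the example vectors are assumptions introduced for illustration.

```python
from math import log2

def shannon(p):
    """Shannon entropy in bits; zero entries contribute nothing."""
    return -sum(x * log2(x) for x in p if x > 0)

def rank(p):
    """Number of non-zero entries."""
    return sum(1 for x in p if x > 0)

def strip_common_zeros(p, q):
    """Sort p and q in descending order and truncate both to rank(q)."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    if rank(p) > rank(q):
        raise ValueError("requires rank(p) <= rank(q)")
    ell = rank(q)
    return sorted(p, reverse=True)[:ell], sorted(q, reverse=True)[:ell]

p = [0.0, 0.9, 0.0, 0.1]
q = [0.3, 0.0, 0.3, 0.4]
p_t, q_t = strip_common_zeros(p, q)

# The truncated q has full rank; the truncated p need not.
assert rank(q_t) == len(q_t)
# Removing zeros does not change the Shannon entropy.
assert abs(shannon(p_t) - shannon(p)) < 1e-12
assert abs(shannon(q_t) - shannon(q)) < 1e-12
```

Here `p_t` keeps one zero entry, which matches the remark that the truncated $p$ does not necessarily have full rank.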

Similarly as for catalytic majorization ii, it is easy to show
that auxiliary distributions which are either fully mixed
(i.e. equal to $(\frac{1}{n}, \ldots, \frac{1}{n}) \in \mathbb{R}^n$
for some $n$) or pure (i.e. contain only zeros and ones) are useless;
they can be removed without altering the c-trumping relation. In other
words, we may assume that every auxiliary system
$r_i \in \mathbb{R}^n$ appearing in (12) has Shannon entropy strictly
positive and strictly less than $\log n$.

IV. Proof of Corollary 2

Proof: It is obvious that every function
$S : \Delta^+ \to \mathbb{R}$ of the form (12) is continuous and has
properties (i), (ii) and (iii). It remains to show the converse; so
suppose that $S$ is a continuous real function on $\Delta^+$ that has
properties (i), (ii), and (iii). Use the notation

$$\eta_n := \left(\frac{1}{n}, \ldots, \frac{1}{n}\right) \in \Delta_n^+ \qquad (n \in \mathbb{N} \setminus \{1\}),$$

and define the “negentropies” for all $p \in \Delta_m^+$,
$m \in \mathbb{N}$, as

$$I(p) := H(\eta_m) - H(p) = \log m - H(p),$$
$$J(p) := S(\eta_m) - S(p). \qquad (20)$$

We claim that $J$ is non-negative. This can be seen from a
simple argument which, for notational reasons, we give only
for $m = 3$, but which obviously works for all $m$. Using
additivity, symmetry, and subadditivity (recalling our matrix
notation for bipartite distributions), we obtain

$$S(\eta_3) + S(p) = S(\eta_3 \otimes p)
= S\begin{pmatrix} p_1/3 & p_2/3 & p_3/3 \\ p_1/3 & p_2/3 & p_3/3 \\ p_1/3 & p_2/3 & p_3/3 \end{pmatrix}
= S\begin{pmatrix} p_1/3 & p_2/3 & p_3/3 \\ p_2/3 & p_3/3 & p_1/3 \\ p_3/3 & p_1/3 & p_2/3 \end{pmatrix}
\leq S(\eta_3) + S(\eta_3),$$

hence $S(p) \leq S(\eta_3)$, and in general $S(p) \leq S(\eta_m)$ for
all $p \in \Delta_m^+$ by the same argument.
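The rearrangement used in this argument can be made concrete for general $m$: the entries of $\eta_m \otimes p$ are permuted into a circulant matrix whose row and column marginals are both uniform. The Python sketch below builds that matrix and checks the marginals. The function name `circulant_rearrangement` and the example $p$ are illustrative assumptions, and Shannon entropy stands in for the abstract function $S$ in the final check.

```python
from math import log2

def shannon(p):
    """Shannon entropy in bits."""
    return -sum(x * log2(x) for x in p if x > 0)

def circulant_rearrangement(p):
    """Entries of eta_m (x) p with row i cyclically shifted by i positions."""
    m = len(p)
    return [[p[(i + j) % m] / m for j in range(m)] for i in range(m)]

p = [0.5, 0.3, 0.2]
m = len(p)
M = circulant_rearrangement(p)

row_marginal = [sum(row) for row in M]
col_marginal = [sum(col) for col in zip(*M)]

# Both marginals are the uniform distribution eta_m.
assert all(abs(x - 1 / m) < 1e-12 for x in row_marginal + col_marginal)

# The matrix is a permutation of the entries of eta_m (x) p.
original = sorted(x / m for _ in range(m) for x in p)
assert sorted(x for row in M for x in row) == original

# With S = Shannon entropy, the conclusion S(p) <= S(eta_m) reads:
assert shannon(p) <= log2(m)
```

For `m = 3` the rows of `M` are exactly the rows of the second matrix in the displayed chain.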

We will now show that $S$ is Schur-concave. Suppose that
$r, s \in \Delta^+$ satisfy $r \succ s$ (implying in particular that
these distributions have the same number of entries). Then, for every
$\varepsilon > 0$, there is a distribution $s_\varepsilon$ with
$\|s - s_\varepsilon\| < \varepsilon$ and a permutation $\pi_{AB}$ on a
bipartite system $AB$ such that

$$s_\varepsilon = [\pi_{AB}(r_A \otimes \eta_B)]_A.$$

That is, $s$ can be obtained to arbitrary accuracy by bringing
in an extra system $B$ in a uniform distribution, performing a
suitable global permutation, and restricting to the marginal on
$A$. This fact has been used extensively in quantum thermodynamics
[23], ifTSl. Thus

$$S(r_A) + S(\eta_B) = S(r_A \otimes \eta_B) = S(\pi_{AB}(r_A \otimes \eta_B))$$
$$\leq S([\pi_{AB}(r_A \otimes \eta_B)]_A) + S([\pi_{AB}(r_A \otimes \eta_B)]_B)$$
$$\leq S(s_\varepsilon) + S(\eta_B),$$

where the last step uses $S(q) \leq S(\eta_n)$ for the marginal on $B$.
By continuity, it follows that $S(r) \leq S(s)$, that is,
Schur-concavity.
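A minimal instance of the construction "uniform ancilla, global permutation, marginal on $A$" is sketched below in Python. The example is deliberately chosen so that the target distribution is reached exactly rather than only approximately; the specific distributions and the chosen permutation (a single swap of two joint outcomes) are illustrative assumptions.

```python
# System A starts in r, the ancilla B in the uniform distribution eta.
r = [0.75, 0.25]
eta = [0.5, 0.5]

# joint[a][b] = r[a] * eta[b], the product distribution on AB.
joint = [[a * b for b in eta] for a in r]

# Global permutation on AB: swap the joint outcomes (0, 1) and (1, 0).
joint[0][1], joint[1][0] = joint[1][0], joint[0][1]

s = [sum(row) for row in joint]             # marginal on A
marg_B = [sum(col) for col in zip(*joint)]  # marginal on B

# r majorizes s, and s is obtained exactly in this example.
assert s == [0.5, 0.5]
# The ancilla is in general no longer uniform afterwards.
assert marg_B == [0.75, 0.25]
```

The non-uniform final marginal on $B$ is why the chain of inequalities needs the bound $S(q) \leq S(\eta_n)$ on that marginal.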

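We claim that for all $p, q \in \Delta^+$,

$$I(p) \geq I(q) \;\Rightarrow\; J(p) \geq J(q). \qquad (21)$$

To see this, suppose that $p \in \Delta_m^+$ and $q \in \Delta_n^+$
with $I(p) \geq I(q)$. If $q = \eta_n$ then $J(q) = 0 \leq J(p)$ as
claimed. Otherwise, for every $\varepsilon \in (0, 1)$, define
$q_\varepsilon := (1 - \varepsilon) q + \varepsilon \eta_n$, then

$$H(p \otimes \eta_n) \leq H(q \otimes \eta_m) < H(q_\varepsilon \otimes \eta_m).$$

Thus, according to Theorem 1, for every $\varepsilon \in (0, 1)$ there
exists some tripartite distribution $c_{123}$ such that

$$p \otimes \eta_n \otimes c_1 \otimes c_2 \otimes c_3 \succ q_\varepsilon \otimes \eta_m \otimes c_{123}.$$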
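The entropy chain that feeds into the theorem can be checked numerically: if the negentropy of $p$ is at least that of $q$, then padding with uniform distributions and mixing $q$ slightly towards the uniform distribution gives a strict entropy gap. The sketch below uses hypothetical example distributions and helper names.

```python
from math import log2

def shannon(p):
    """Shannon entropy in bits."""
    return -sum(x * log2(x) for x in p if x > 0)

def uniform(n):
    """Uniform distribution eta_n."""
    return [1.0 / n] * n

def tensor(p, q):
    """Product distribution as a flat list."""
    return [a * b for a in p for b in q]

def negentropy(p):
    """I(p) = log n - H(p) for a distribution with n entries."""
    return log2(len(p)) - shannon(p)

p = [0.7, 0.2, 0.1]  # m = 3
q = [0.6, 0.4]       # n = 2, not uniform
m, n = len(p), len(q)
assert negentropy(p) >= negentropy(q)

for eps in (0.5, 0.1, 0.01):
    # q_eps = (1 - eps) q + eps eta_n
    q_eps = [(1 - eps) * x + eps / n for x in q]
    lhs = shannon(tensor(p, uniform(n)))
    mid = shannon(tensor(q, uniform(m)))
    rhs = shannon(tensor(q_eps, uniform(m)))
    assert lhs <= mid + 1e-12  # equivalent to I(p) >= I(q)
    assert mid < rhs           # strict, since q is not uniform
```

Both padded distributions have `m * n` entries and full rank, so the rank condition of the theorem holds as well.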

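Using (ii), (iii), and Schur-concavity, we get

$$S(p) + S(\eta_n) + S(c_1) + S(c_2) + S(c_3)$$
$$= S(p \otimes \eta_n \otimes c_1 \otimes c_2 \otimes c_3)$$
$$\leq S(q_\varepsilon \otimes \eta_m \otimes c_{123})$$
$$= S(q_\varepsilon) + S(\eta_m) + S(c_{123})$$
$$\leq S(q_\varepsilon) + S(\eta_m) + S(c_1) + S(c_2) + S(c_3).$$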

Therefore $J(p) \geq J(q_\varepsilon)$, and by continuity
$J(p) \geq J(q)$. This proves (21). If $I(p) = I(q)$ then we have (21)
in both directions, hence $J(p) = J(q)$. Thus there is a function
$f : [0, \infty) \to \mathbb{R}$ with $f(0) = 0$ such that
$J(p) = f(I(p))$ for all $p \in \Delta^+$. According to (21), this
function $f$ is increasing. If $x, y \geq 0$, let $p, q \in \Delta^+$
be distributions with $I(p) = x$ and $I(q) = y$, then

$$f(x + y) = f(I(p) + I(q)) = f(I(p \otimes q)) = J(p \otimes q)$$
$$= J(p) + J(q) = f(I(p)) + f(I(q))$$
$$= f(x) + f(y).$$

Thus, $f$ is an additive monotone function, and it is well-known
(and easy to check) that all functions of this kind are linear.
Hence there is a constant $c \in \mathbb{R}$ such that
$J(p) = c \cdot I(p)$, and this constant cannot be negative due to
(21). Recalling the definition (20), we get for $p \in \Delta_m^+$

$$S(p) = c \cdot H(p) + \underbrace{S(\eta_m) - c \log m}_{=:\, c_m},$$

and from $\eta_{mn} = \eta_m \otimes \eta_n$ it is easy to check that
$c_{mn} = c_m + c_n$. ■


A few comments are in order regarding the statement of
this corollary. Note that the additivity property
$c_{mn} = c_m + c_n$ for the dimension-dependent constants does not
automatically imply that $c_n = b \cdot \log n$ for some constant
$b \in \mathbb{R}$. While this is a possible choice of $c_n$, there are
other choices, and one needs additional assumptions to conclude that
$c_n$ is a logarithm, cf. [24].
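To illustrate that the functional equation $c_{mn} = c_m + c_n$ on its own does not force a logarithm, consider the exponent of 2 in the prime factorization of $n$. This is a standard example of a completely additive arithmetic function and is not taken from the paper. The sketch only checks the functional equation on a finite range.

```python
def v2(n):
    """Exponent of 2 in the prime factorization of n (requires n >= 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k

# The functional equation c_{mn} = c_m + c_n holds ...
assert all(v2(m * n) == v2(m) + v2(n)
           for m in range(1, 50) for n in range(1, 50))

# ... but v2 is not of the form b * log(n): v2(3) = 0 would force
# b = 0, while v2(2) = 1 is non-zero.
assert v2(3) == 0 and v2(2) == 1
```

Such a function is not monotone in $n$, which hints at the kind of additional assumption that singles out the logarithm.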

It is well-known that Hartley entropy $H_0$ is symmetric,
additive, and subadditive. However, if $p \in \Delta_n^+$, i.e. $p$
does not contain zeros, then $H_0(p) = \log n$, i.e. a
dimension-dependent constant, which is covered by our theorem.

From the structure of the proof, one can conclude that
the actual mathematically “natural” quantity is not Shannon
entropy $H$ itself, but negentropy $I(p) := \log n - H(p)$
(for $p \in \Delta_n^+$). This resembles the fact that $I$ (and not
$H$) turns out to be the relevant quantity to describe the amount
of extractable work in many situations in thermodynamics,
cf. ED, QSl.

Note that the Rényi entropies $H_\alpha$ and the Burg entropy
$H_{\mathrm{Burg}}$ are continuous, symmetric, and additive, and so are
non-negative (discrete or continuous) linear combinations of
them. It is therefore natural to conjecture that these are the only
real functions on $\Delta^+$ that satisfy the analog of Corollary 2 if
the assumption of subadditivity (ii) is dropped. This conjecture
resembles Example 7.10 in [12]. However, it is not clear
whether the methods of this paper can contribute in any
way to a resolution of this conjecture.

V. Conclusions 

We have introduced a new relation on the finite discrete 
probability distributions, called c-trumping, which is part of a 
series of natural generalizations of the notions of majorization 
and trumping as studied in quantum information theory. It 
is meant to elucidate the relation between correlation and 
disorder, and turns out to be completely characterized by 
Shannon entropy H. We have also shown that this insight can 
be used to obtain a very simple proof of a weaker version of
Aczél et al.’s characterization result [10].

It has been noted before that the notion of trumping, or 
catalysis, is very sensitive to the detailed requirements on how 
the catalysts are retained in the end. For example, if the trumping
condition $p \otimes r \succ q \otimes r$ is replaced by the weaker
condition that $p \otimes r \succ q \otimes r'$, where $r'$ is
$\varepsilon$-close in variation distance to $r$ for some fixed
$\varepsilon > 0$, then all transitions from any $p$ to any $q$ become
possible,
and the resulting relation becomes trivial. This phenomenon 
has been called embezzling in the context of entanglement 
theory ll^ and thermodynamics 01. If one demands that the 
variation distance is smaller than e divided by the logarithm 
of the catalyst dimension, then it turns out that the Shannon 
entropy H determines the allowed transitions 01, which is 
somewhat similar to our result. 

So can Theorem 1 be interpreted as an instance of embezzling?
We do not think so. Note that we demand that
the auxiliary systems $r_1, \ldots, r_k$ preserve their local states
exactly. More generally, while it has been argued in 0] that
“closeness in variation distance” is simply not a physically
meaningful requirement, we think that “local preservation
of the auxiliary distributions” is a physically well-motivated
condition: restrictions on transformations in physics usually 
arise from conservation laws. But in most situations, conserved 
quantities (like energy or angular momentum) are sums of 
local quantities as long as interaction terms can be neglected. 
In this sense, our result says in what way we can exploit 
auxiliary systems as resources, if these systems are forced to 
preserve their local states due to local conservation laws. 

If local states are allowed to change, then physical intu¬ 
ition expects these systems to thermalize; in the context of 
majorization, this amounts to getting closer to the uniform 
distribution. This paper can be interpreted as studying the 
complementary situation in which local states are forced to 
be fixed. Theorem 1 then gives a classification of what is
possible in this regime, and suggests that there might be some 
situations of this kind in physics where correlations build up 
spontaneously. 

The c-trumping relation represents a special instance of 
a more general problem: instead of asking whether a given 
distribution p can be transformed into another distribution q 
by some bistochastic map, we can ask whether this is possible 
if some additional resources are consumed or produced during 
the transformation. 

More formally, think of some set of input auxiliary distributions
$\mathcal{I}$, and to every $r \in \mathcal{I}$ a corresponding set of
output distributions $\mathcal{O}_r$. We may then ask whether there
exist auxiliary distributions $r \in \mathcal{I}$ and
$r' \in \mathcal{O}_r$ such that $p \otimes r \succ q \otimes r'$.
If $r$ is in some sense “more valuable” than $r'$, then the
transition $p \to q$ can be accomplished at the cost of some
auxiliary resource; otherwise we have a resource yield. While
this formulation represents a simplification of the general idea
of a resource theory [9], ifT^ , it may already lead to non-trivial
but mathematically tractable relations on probability
distributions, with in some cases interesting consequences for
thermodynamics.

In the case of c-trumping, $\mathcal{I}$ is the set of product
distributions, $\mathcal{O}_r$ is the set of multipartite distributions
that have the same marginals as $r \in \mathcal{I}$, and transitions
involve a cost of stochastic independence. A different example is
given by
the notion of lambda-majorization that has been introduced 
in ifOl to calculate the work cost of arbitrary processes such 
as Landauer erasure. They study transitions of the form 




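$$p \otimes \eta_2^{\otimes i} \otimes \chi_2^{\otimes (n-i)} \;\mapsto\; q \otimes \eta_2^{\otimes j} \otimes \chi_2^{\otimes (n-j)}$$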


via bistochastic maps, where $\chi_2 = (1, 0)$ is a “pure bit”, and
$\eta_2 = (\frac{1}{2}, \frac{1}{2})$. Given $p$ and $q$ of identical
size, they ask for the maximal $\lambda := i - j$ over all
$n \in \mathbb{N}$ (arbitrary number of “auxiliary bits”) such that a
transition of this form is possible, i.e. the left-hand side majorizes
the right-hand side. This is interpreted as extraction of work
proportional to $\lambda$ by resorting to Landauer’s principle. In our
formalism, we can fix $\lambda \in \mathbb{Z}$, define $\mathcal{I}$ as
the set of all distributions
$r_{k,l} := \eta_2^{\otimes k} \otimes \chi_2^{\otimes l}$ with
arbitrary $k, l \in \mathbb{N}_0$, $k \geq \lambda$, and
$\mathcal{O}_{k,l}$ as the set of all distributions of the form
$\eta_2^{\otimes (k-\lambda)} \otimes \chi_2^{\otimes (l+\lambda)}$.
This way, our formalism expresses the question whether work extraction
proportional to the given value of $\lambda$ is possible.
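For small sizes, the maximal $\lambda$ can be found by brute force. The Python sketch below assumes that the transition has the form $p \otimes \eta_2^{\otimes i} \otimes \chi_2^{\otimes (n-i)} \succ q \otimes \eta_2^{\otimes j} \otimes \chi_2^{\otimes (n-j)}$ with $\lambda = i - j$. This form is a reading of the garbled display that is consistent with the sets of input and output distributions defined above, and should be treated as an assumption. The function names and the search bound `n_max` are hypothetical.

```python
from itertools import accumulate

ETA2 = [0.5, 0.5]  # fully mixed bit
CHI2 = [1.0, 0.0]  # pure bit

def majorizes(p, q, tol=1e-12):
    """Return True if p majorizes q (same length)."""
    cp = accumulate(sorted(p, reverse=True))
    cq = accumulate(sorted(q, reverse=True))
    return all(a >= b - tol for a, b in zip(cp, cq))

def tensor(*factors):
    """Product of any number of distributions (empty product is [1.0])."""
    out = [1.0]
    for f in factors:
        out = [a * b for a in out for b in f]
    return out

def power(d, k):
    """k-fold tensor power of the distribution d."""
    return tensor(*([d] * k))

def max_lambda(p, q, n_max=4):
    """Largest i - j with 0 <= i, j <= n <= n_max for which the
    assumed transition is possible; None if no transition is found."""
    best = None
    for n in range(n_max + 1):
        for i in range(n + 1):
            for j in range(n + 1):
                lhs = tensor(p, power(ETA2, i), power(CHI2, n - i))
                rhs = tensor(q, power(ETA2, j), power(CHI2, n - j))
                if majorizes(lhs, rhs) and (best is None or i - j > best):
                    best = i - j
    return best

# Randomizing a pure bit yields one unit; erasing a mixed bit costs one.
assert max_lambda([1.0, 0.0], [0.5, 0.5]) == 1
assert max_lambda([0.5, 0.5], [1.0, 0.0]) == -1
```

The two checks reproduce the Landauer-type bookkeeping for a single bit under the stated assumption. The search is exponential in `n_max` and only meant for toy examples.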

As the results in this paper indicate, the study of generalized
majorization relations of this kind may lead to surprising insights
into the “usefulness” of information-theoretic properties.
This contributes to the general question of how different kinds
of knowledge (represented by probability distributions) can
be “put to work” via interconversion, and in what way this
is expressed by the values of entropy-like quantities. Clearly,
this kind of reasoning is not restricted to classical probability
distributions, but can be applied to quantum states as well, which
are the main subject of interest in quantum thermodynamics.